Reliable Attribute Selection Based on Random Forest (RASER)
Authors
Abstract
Feature selection has become one of the most active research areas in data mining. It allows redundant and irrelevant features to be removed from large datasets, and many attribute selection methods have been proposed in the literature. In this article, a new multi-objective method is proposed to select relevant and non-redundant features. The proposed method has three stages. The first stage computes a relevance value for each feature using random forests. The second stage computes a dissimilarity matrix representing the dependence between the features of the training dataset and transforms it into a complete graph whose nodes represent features and whose edges carry the dissimilarity values between them. The last stage applies a multi-objective optimization algorithm. The proposed method is applied to many datasets to find the most relevant and non-redundant features, and its performance is compared with that of the popular MBEGA, mRMR (MIQ) and mRMR (MID) methods.
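The abstract does not specify the relevance measure, the dissimilarity measure, or the optimizer, so the following is only a minimal Python sketch of the three-stage idea under stated assumptions, not the authors' algorithm. The relevance scores are hard-coded placeholders standing in for random-forest importances (stage 1). Dissimilarity is assumed to be 1 - |Pearson r| between feature columns (stage 2). The multi-objective step (stage 3) is replaced by an exhaustive Pareto-front search over fixed-size subsets that maximizes mean relevance and mean pairwise dissimilarity, which is feasible only for a handful of features. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sxy / sqrt(sxx * syy)


def dissimilarity_graph(columns):
    """Stage 2 (assumed form): complete graph, edge (i, j) weighted by 1 - |r_ij|."""
    return {
        (i, j): 1.0 - abs(pearson(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    }


def objectives(subset, relevance, graph):
    """Two objectives to maximise: mean relevance and mean pairwise dissimilarity.

    Requires len(subset) >= 2 so that at least one edge exists.
    """
    rel = sum(relevance[i] for i in subset) / len(subset)
    pairs = list(combinations(subset, 2))  # subset is sorted, so i < j as in graph keys
    dis = sum(graph[p] for p in pairs) / len(pairs)
    return rel, dis


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(relevance, graph, k):
    """Stage 3 (simplified stand-in): score every k-subset, keep the non-dominated ones."""
    scored = [(s, objectives(s, relevance, graph))
              for s in combinations(range(len(relevance)), k)]
    return [(s, f) for s, f in scored
            if not any(dominates(g, f) for _, g in scored)]


# Toy data (hypothetical): four features observed on five samples, stored as columns.
columns = [
    [1, 2, 3, 4, 5],    # f0
    [2, 4, 6, 8, 10],   # f1 = 2 * f0, fully redundant with f0
    [2, 1, 2, 1, 2],    # f2
    [1, -1, 0, -1, 1],  # f3, uncorrelated with f0 and f1
]
# Placeholder relevance scores standing in for random-forest importances (stage 1).
relevance = [0.9, 0.85, 0.3, 0.5]

front = pareto_front(relevance, dissimilarity_graph(columns), k=2)
for subset, (rel, dis) in front:
    print(subset, round(rel, 3), round(dis, 3))
```

On this toy data the front keeps two trade-offs: {f0, f1}, which has the highest mean relevance but is fully redundant, and {f0, f3}, which gives up some relevance for zero correlation. A real implementation would replace the exhaustive search with an evolutionary multi-objective optimizer once the number of features grows.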
Similar resources
A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)
Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality results in slower learning and higher computational cost. Feature selection is expected to deal with the high dimen...
Machine Learning based Approach for protein Function Prediction using Sequence Derived Properties
Protein function prediction is an important and challenging field in bioinformatics. Various machine learning based approaches have been proposed to predict protein functions using sequence-derived properties. In this paper 857 sequence-derived features such as amino acid composition, dipeptide composition, correlation, composition, transition and distribution and pseudo amino aci...
Investigation of Random Forest Performance with Cancer Microarray Data
Diagnosing cancer type from microarray data offers hope that cancer can be classified accurately enough for clinicians to choose the most appropriate treatment. Because of several inherent characteristics of microarray data, accurate diagnosis has been an active research topic attracting tremendous interest in the machine learning community. In this p...
GRASP Forest: A New Ensemble Method for Trees
This paper proposes a method for constructing ensembles of decision trees: GRASP Forest. This method uses the metaheuristic GRASP, usually used in optimization problems, to increase the diversity of the ensemble. While Random Forest increases the diversity by randomly choosing a subset of attributes in each tree node, GRASP Forest takes into account all the attributes, the source of randomness ...
An Optimization Rough Set Boundary Region based Random Forest Classifier
Machine learning is concerned with the design and development of algorithms. It is an approach to programming computers to achieve optimization. Classification is the prediction approach among data mining techniques. The decision tree algorithm is the most common classifier because it is easy to implement and understand. Attribute selection is a concept by which we wa...
Publication date: 2016